1. INTRODUCTION

Webster's Third New International Dictionary defines an index 
as: "a usually alphabetical list that includes all or nearly all 
items (as topics, names of people and places) considered of special 
pertinence and fully or partially covered or merely mentioned in a 
printed or written work (as a book, catalog, or dissertation)..."

Subject indexing is conventionally an intellectual effort that
relies on knowledge of a subject area and the use of a structured
vocabulary from which to choose index terms (e.g. glossary,
thesaurus, classification scheme, etc.). Because manual subject
indexing requires training and ability, it results in a time lag
between receipt of a document and its subsequent availability
through an index or catalog.

Concordances utilized an indexing technique which required only 
that significant words in the text of a document be identified and 
arranged in an alphabetical array as the index to the document. 
H. P. Luhn was the first to utilize this technique with the aid of 
a computer to produce KWIC (keyword-in-context) indexes to docu- 
ments. The title of a document and an ID code are the input; the 
output is a series of permuted titles, one for each "keyword" (ar- 
ticles, conjunctions, etc. are suppressed). 

The KWOC (keyword-out-of-context) index is a variation of the
KWIC index and was developed at Stanford Research Institute.

The variations, advantages, disadvantages, and myriad
applications of KWIC-type indexes can be explored in other publications.

Some advantages can be cited briefly here: 

No content analysis is necessary; all indexing is fully
automatic, which cuts indexing time and costs. Index terms are
directly representative of the author's terminology and thus are not
constrained by obsolete and restrictive indexing schemes.
KWIC-type indexes provide good depth of indexing using an automatic
method (e.g. each title is indexed by as many keywords as it
contains), thereby simultaneously providing a coordinate index of
title "keywords" and a crude associative index. The resultant
index is simple to use and requires no training.

2. PROGRAM DESCRIPTION 

A KWOC processor is available in the OS-3 operating system. 
The program is filed under the name *KWOC. This program may be 
used to generate a "permuted index" to a set of document titles, 
or, more generally, to a set of variable length records stored in 
some file. Output from the program is a file that consists of an 
alphabetically ordered set of keywords, each of which is followed 
immediately by an alphabetically ordered list of all records con- 
taining these keywords. A condensed list of keywords is also out- 
put. Since not all words in a title are information-bearing key- 
words (e.g. and, the, etc.), *KWOC also accepts as input a file 
of words that are ignored during indexing. 
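
The processing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the *KWOC program itself: the function name kwoc_index, the regular expression, and the use of Python's default sort order (rather than the machine's BCD collating sequence) are all assumptions made for the example.

```python
import re

# A word starts with a letter or dash and continues with letters, digits, or
# dashes; the lookbehind keeps a string such as "3300CDC" from yielding a word.
WORD = re.compile(r"(?<![A-Z0-9-])[A-Z-][A-Z0-9-]*")

def kwoc_index(records, suppress=()):
    """Return (index, keywords): index maps each keyword to its sorted records."""
    stop = {w.upper() for w in suppress}
    index = {}
    for record in records:
        # set() lists a record only once under a keyword it repeats
        for word in set(WORD.findall(record.upper())):
            if word not in stop:
                index.setdefault(word, []).append(record)
    keywords = sorted(index)
    return {k: sorted(index[k]) for k in keywords}, keywords

titles = [
    "KWOC INDEXING, ITS USES AND ABUSES",
    "LITTLE KNOWN FACTS ABOUT KWOC INDEXING",
    "MARY HAD A LITTLE LAMB",
]
index, keywords = kwoc_index(titles, suppress=["A", "ABOUT", "AND", "ITS"])
for keyword in keywords:
    print(keyword)
    for record in index[keyword]:
        print(" " * 12 + record)   # records are indented under their keyword
```

The sketch keeps every record in memory, which is acceptable for a handful of titles; it does not attempt the line breaking or scratch-file sorting that the real processor performs.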

Although *KWOC is extensively parameterized, default options 
do exist which satisfy most cases. The program is called by the 
control statement *KWOC followed by a string of parameters. Par- 
ameters that may be used are as follows: 



I or Input = file name or logical unit (lun) of input file. 

If no file exists under the given name or if the 
specified lun is not equipped, the program will 
abort. If the I parameter is not present in the 
parameter string, lun 60 is assumed. 

O or Output = file name or lun of output file. If no file
exists under the given name, one will be created;
if no lun exists, one will be equipped. Format
of the output file is: KWOC Index, EOF, Keyword
List, EOF. If the O parameter is not present in
the parameter string, lun 61 is assumed.

S or Suppress = file name or lun of a file that contains any
words that should not be used as keywords. This
file should contain one such word per record
with a file mark at the end of the file. A file
of about 150 commonly suppressed words is
available under the name *SUPPRES. If the S parameter
is not present, all words will be treated as
keywords.

L = the maximum line length, in characters, of output
records. Since each record is indented with
respect to the keyword (see examples), the length
of each output record will be twelve characters
greater than the declared line size, i.e. twelve
leading blanks are inserted in each line. If
line size is not specified as a parameter, output
records will have a maximum length of 110
characters (a convenient size for printed output).

B or Break = an internal BCD code used to break an output line
if the output record length exceeds the current
line size. Output will then occur in two or more
consecutive lines. A table of characters and
their internal BCD codes is given in Appendix I.
If the value of B exceeds (100) octal, the line will
be broken exactly at its maximum length. Otherwise
the line will be scanned from right to left
for the first occurrence of the specified BCD
character, and the line broken at that point.
The break character itself will not appear in
the output. If the break character is not present
in a line, the line will be broken at its maximum
length. If the Break parameter is not present
in the parameter string, the program will break
oversized lines at a space (60 octal).

X = delimiting character, not an alphanumeric, where
scanning is to commence in each input record.
X is given as an internal BCD code. If no X
is given in the parameter string, scanning will
begin at the first character of each input record.



Y = delimiting character, not an alphanumeric, where 
scanning is to stop in each input record. Y is 
an internal BCD code. If no Y is specified, 
scanning will continue until the end of every 
input record. 
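
The effect of X and Y can be illustrated with a short Python sketch. It works on ordinary characters instead of BCD codes, and the function name scan_window is invented for the example; returning nothing when the begin delimiter never appears is an assumption based on the rule that no processing occurs until that delimiter is met.

```python
def scan_window(record, begin=None, end=None):
    """Return the part of record that would be scanned for keywords."""
    start = 0
    if begin is not None:
        pos = record.find(begin)
        if pos < 0:
            return ""            # assumption: begin delimiter never seen, nothing scanned
        start = pos + 1          # scanning commences after the delimiter
    stop = len(record)
    if end is not None:
        pos = record.find(end, start)
        if pos >= 0:
            stop = pos           # scanning stops at the end delimiter
    return record[start:stop]

citation = ("BORKO, HAROLD. *THE BOLD (BIBLIOGRAPHIC ON-LINE DISPLAY) "
            "SYSTEM.* $1967$ =AD-632473=")
print(scan_window(citation, begin="*", end="*"))
```

With the asterisk as both delimiters, only the title between the two asterisks is passed on for keyword selection; the author and the report number are left out of the scan.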

T or Table = file name or logical unit of a file which contains
a table used to define those character
strings that are to be construed as words. For
example, the program in normal operation will 
never try to use a comma as a keyword, and any 
keyword followed by a comma is, in effect, delim- 
ited by that comma. In particular, a word is de- 
fined as an alphabetic character followed by a 
string of consecutive alphabetic or numeric char- 
acters. Thus, "CDC3300" is a word but "3300CDC" 
is not. All characters that are not alphabetics 
or numerics are delimiters except a space, which 
is a delimiter of a special sort. The Table 
parameter allows the user to redefine the set 
of recognized alphabetic characters. The class 
of any character may be changed to an alphabe- 
tic with the exception of the space. If T is 
used, then any characters that are not declared 
to be alphabetic become delimiters, except num- 
bers. Numbers remain as numbers unless they 
were redeclared to be alphabetics. The format
of the file specified by the T parameter should
be one character per record with a file mark
at the end of the file. If the T parameter is
not present in the parameter string, then the
set of characters used as alphabetics consists
of the 26 letters of the alphabet together with
the dash (-).
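
The word definition above can be sketched as a small Python function. The names words and DEFAULT_ALPHABETICS are invented for the example, and one point is an assumption: a run of characters that begins with a digit is skipped as a whole, which is one reading of the statement that "3300CDC" is not a word.

```python
import string

# Default alphabetics: the 26 letters together with the dash.
DEFAULT_ALPHABETICS = set(string.ascii_uppercase) | {"-"}

def words(record, alphabetics=DEFAULT_ALPHABETICS):
    """Return the words in record: an alphabetic followed by alphabetics or digits."""
    alphabetics = set(alphabetics) - {" "}   # the space can never be an alphabetic
    found = []
    i, n = 0, len(record)
    while i < n:
        ch = record[i]
        if ch in alphabetics or ch in string.digits:
            j = i
            while j < n and (record[j] in alphabetics or record[j] in string.digits):
                j += 1
            if ch in alphabetics:            # a run starting with a digit is not a word
                found.append(record[i:j])
            i = j
        else:
            i += 1                           # delimiter or space
    return found

print(words("CDC3300 AND 3300CDC"))
print(words("SMITH,J.J. ON-LINE", DEFAULT_ALPHABETICS | {",", "."}))
```

The second call mimics a T file that adds the comma and the period to the alphabetics, so that a name typed without blanks is kept together as a single word.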

3. CONSTRAINTS 

The maximum size of a keyword is forty characters. Any key- 
word that exceeds this length will be truncated. In addition, the 
maximum size of an input record is 1000 characters. 

The program uses various logical units as scratch files. These 
are in the range 50-59. If the program requires a lun, it will 
unequip any logical unit in this range found to be equipped. 

The program is highly space consuming. This can easily be 
seen since if an input record contains ten keywords, then that 
input record will occur ten times in the output file. In addition, 
scratch space of at least twice the size of the output file is 
required internally in order to sort the output file. Since most 
of the time required to produce a large index is spent sorting the 
output file into alphabetic order, the user is referred to the 
SORT/MERGE manual (cc-68-37) for timing considerations. 
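
As a rough worked example of the space requirement, the arithmetic can be written out in Python. The function name estimate_space is invented here, and the estimate deliberately ignores keyword heading lines and the twelve leading blanks, so it is a lower bound under those assumptions rather than an exact figure.

```python
def estimate_space(records_with_counts):
    """records_with_counts: list of (record_length, keyword_count) pairs."""
    pairs = list(records_with_counts)
    output_records = sum(count for _, count in pairs)
    output_chars = sum(length * count for length, count in pairs)
    scratch_chars = 2 * output_chars   # at least twice the size of the output file
    return output_records, output_chars, scratch_chars

# One 100-character record with ten keywords occurs ten times in the output.
print(estimate_space([(100, 10)]))
```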



4. EXAMPLES 

The first example will deal with a file named TEST: 

KWOC INDEXING, ITS USES AND ABUSES 
LITTLE KNOWN FACTS ABOUT KWOC INDEXING 
MARY HAD A LITTLE LAMB 

The KWOC index for this file will utilize 3 parameters: In- 
put, Output and Line Size; lines break on a blank; no terms are 
suppressed; scanning begins with the first term and continues 
through the end of line, since neither X nor Y is specified; and, 
since the parameter T is not referenced, words are defined as 
character strings beginning with the characters A through Z and 
the dash, and followed by the characters A through Z, the dash, 
or the numbers 0 through 9.

The KWOC parameter string will look like this: 

*KWOC,I=TEST,O=EXAMPLE,L=35

The output, EXAMPLE, follows: 
A
            MARY HAD A LITTLE LAMB
ABOUT
            LITTLE KNOWN FACTS ABOUT KWOC
            INDEXING
ABUSES
            KWOC INDEXING, ITS USES AND ABUSES
AND
            KWOC INDEXING, ITS USES AND ABUSES
FACTS
            LITTLE KNOWN FACTS ABOUT KWOC
            INDEXING
HAD
            MARY HAD A LITTLE LAMB
INDEXING
            KWOC INDEXING, ITS USES AND ABUSES
            LITTLE KNOWN FACTS ABOUT KWOC
            INDEXING
ITS
            KWOC INDEXING, ITS USES AND ABUSES
KNOWN
            LITTLE KNOWN FACTS ABOUT KWOC
            INDEXING
KWOC
            KWOC INDEXING, ITS USES AND ABUSES
            LITTLE KNOWN FACTS ABOUT KWOC
            INDEXING
LAMB
            MARY HAD A LITTLE LAMB
LITTLE
            LITTLE KNOWN FACTS ABOUT KWOC
            INDEXING
            MARY HAD A LITTLE LAMB
MARY
            MARY HAD A LITTLE LAMB
USES
            KWOC INDEXING, ITS USES AND ABUSES

A
ABOUT
ABUSES
AND
FACTS
HAD
INDEXING
ITS
KNOWN
KWOC
LAMB
LITTLE
MARY
USES

It can be seen from the first example that many "keywords"
are not useful and it would be desirable to have these words sup- 
pressed. A sample of common terms to be suppressed which are con- 
tained in the file *SUPPRES follows: 

A
ABOUT
AD
ALL
AMONG
AN
AND
ANOTHER
ARE
AS
AT
BASED
BE
BETWEEN
BY



By adding the S parameter, 

(*KWOC,I=TEST,O=XAMPLE,L=35,S=*SUPPRES)

the following output is the result: 

ABUSES
            KWOC INDEXING, ITS USES AND ABUSES
FACTS
            LITTLE KNOWN FACTS ABOUT KWOC
            INDEXING
HAD
            MARY HAD A LITTLE LAMB
INDEXING
            KWOC INDEXING, ITS USES AND ABUSES
            LITTLE KNOWN FACTS ABOUT KWOC
            INDEXING
KNOWN
            LITTLE KNOWN FACTS ABOUT KWOC
            INDEXING
KWOC
            KWOC INDEXING, ITS USES AND ABUSES
            LITTLE KNOWN FACTS ABOUT KWOC
            INDEXING
LAMB
            MARY HAD A LITTLE LAMB
LITTLE
            LITTLE KNOWN FACTS ABOUT KWOC
            INDEXING
            MARY HAD A LITTLE LAMB
MARY
            MARY HAD A LITTLE LAMB
USES
            KWOC INDEXING, ITS USES AND ABUSES

ABUSES
FACTS
HAD
INDEXING
KNOWN
KWOC
LAMB
LITTLE
MARY
USES

One can create a suppression list through COPY, *TVCOPY,
*TVE or EDIT by inputting the desired words, one word per input
record. It is more likely that one would want to expand the 
list of words found in *SUPPRES. For example, one might want the 
words HAD, KNOWN, LITTLE and USES suppressed, in addition to A, 
ABOUT, AND and ITS in the file TEST. The file *SUPPRES can be 
copied, and through the INSERT or APPEND commands in EDIT, these 
additional terms can be added. 

Another input file, LBJ, contains citations for three technical
reports and one book:

BLACKWELL, FREDERICK W. *ON-LINE COMPUTER SYMBOLIC MANIPULATION.*
$1966$ =QA76-B55=

BORKO, HAROLD. *THE BOLD (BIBLIOGRAPHIC ON-LINE DISPLAY) SYSTEM.*
$1967$ =AD-632473=

KELLOGG, C.H. *ON-LINE TRANSLATION OF NATURAL LANGUAGE QUESTIONS
INTO ARTIFICIAL LANGUAGE QUERIES.* $1967$ =AD-643494=

BORKO, HAROLD, ET AL. *ON-LINE INFORMATION RETRIEVAL USING
ASSOCIATIVE INDEXING.* $1968$ =AD-670195=

Assuming you wish only the title portion to be scanned for
your index, and the call number or report number should in no
case be broken up in the resultant index, the parameter string
would look like this:

*KWOC,I=LBJ,S=HHH,L=40,X=54,Y=54,B=13,O=RMN

(Where HHH is a suppression file you have created, line size =
40 characters, scanning begins with data following the first
asterisk (BCD code 54) and ends with the next asterisk encountered,
the break is on the equal sign (BCD code 13), and your output
file is RMN.)

RMN will appear as follows: 

ARTIFICIAL
            KELLOGG, C.H. *ON-LINE TRANSLATION OF NA
            TURAL LANGUAGE QUESTIONS INTO ARTIFICIAL
            LANGUAGE QUERIES.* $1967$ =AD-643494=
ASSOCIATIVE
            BORKO, HAROLD, ET AL. *ON-LINE INFORMATI
            ON RETRIEVAL USING ASSOCIATIVE INDEXING.
            * $1968$ =AD-670195=
BIBLIOGRAPHIC
            BORKO, HAROLD. *THE BOLD (BIBLIOGRAPHIC
            ON-LINE DISPLAY) SYSTEM.* $1967$ =
            AD-632473=
BOLD
            BORKO, HAROLD. *THE BOLD (BIBLIOGRAPHIC
            ON-LINE DISPLAY) SYSTEM.* $1967$ =
            AD-632473=
COMPUTER
            BLACKWELL, FREDERICK W. *ON-LINE COMPUTE
            R SYMBOLIC MANIPULATION.* $1966$ =
            QA76-B55=
DISPLAY
            BORKO, HAROLD. *THE BOLD (BIBLIOGRAPHIC
            ON-LINE DISPLAY) SYSTEM.* $1967$ =
            AD-632473=
INDEXING
            BORKO, HAROLD, ET AL. *ON-LINE INFORMATI
            ON RETRIEVAL USING ASSOCIATIVE INDEXING.
            * $1968$ =AD-670195=
INFORMATION
            BORKO, HAROLD, ET AL. *ON-LINE INFORMATI
            ON RETRIEVAL USING ASSOCIATIVE INDEXING.
            * $1968$ =AD-670195=
LANGUAGE
            KELLOGG, C.H. *ON-LINE TRANSLATION OF NA
            TURAL LANGUAGE QUESTIONS INTO ARTIFICIAL
            LANGUAGE QUERIES.* $1967$ =AD-643494=
MANIPULATION
            BLACKWELL, FREDERICK W. *ON-LINE COMPUTE
            R SYMBOLIC MANIPULATION.* $1966$ =
            QA76-B55=
NATURAL
            KELLOGG, C.H. *ON-LINE TRANSLATION OF NA
            TURAL LANGUAGE QUESTIONS INTO ARTIFICIAL
            LANGUAGE QUERIES.* $1967$ =AD-643494=
ON-LINE
            BLACKWELL, FREDERICK W. *ON-LINE COMPUTE
            R SYMBOLIC MANIPULATION.* $1966$ =
            QA76-B55=
            BORKO, HAROLD. *THE BOLD (BIBLIOGRAPHIC
            ON-LINE DISPLAY) SYSTEM.* $1967$ =
            AD-632473=
            BORKO, HAROLD, ET AL. *ON-LINE INFORMATI
            ON RETRIEVAL USING ASSOCIATIVE INDEXING.
            * $1968$ =AD-670195=
            KELLOGG, C.H. *ON-LINE TRANSLATION OF NA
            TURAL LANGUAGE QUESTIONS INTO ARTIFICIAL
            LANGUAGE QUERIES.* $1967$ =AD-643494=
QUERIES
            KELLOGG, C.H. *ON-LINE TRANSLATION OF NA
            TURAL LANGUAGE QUESTIONS INTO ARTIFICIAL
            LANGUAGE QUERIES.* $1967$ =AD-643494=
QUESTIONS
            KELLOGG, C.H. *ON-LINE TRANSLATION OF NA
            TURAL LANGUAGE QUESTIONS INTO ARTIFICIAL
            LANGUAGE QUERIES.* $1967$ =AD-643494=
RETRIEVAL
            BORKO, HAROLD, ET AL. *ON-LINE INFORMATI
            ON RETRIEVAL USING ASSOCIATIVE INDEXING.
            * $1968$ =AD-670195=
SYMBOLIC
            BLACKWELL, FREDERICK W. *ON-LINE COMPUTE
            R SYMBOLIC MANIPULATION.* $1966$ =
            QA76-B55=
SYSTEM
            BORKO, HAROLD. *THE BOLD (BIBLIOGRAPHIC
            ON-LINE DISPLAY) SYSTEM.* $1967$ =
            AD-632473=
TRANSLATION
            KELLOGG, C.H. *ON-LINE TRANSLATION OF NA
            TURAL LANGUAGE QUESTIONS INTO ARTIFICIAL
            LANGUAGE QUERIES.* $1967$ =AD-643494=

ARTIFICIAL
ASSOCIATIVE
BIBLIOGRAPHIC
BOLD
COMPUTER
DISPLAY
INDEXING
INFORMATION
LANGUAGE
MANIPULATION
NATURAL
ON-LINE
QUERIES
QUESTIONS
RETRIEVAL
SYMBOLIC
SYSTEM
TRANSLATION

The pre-coordination (or "linking") of terms, concepts and
names can be accomplished by treating multiple words as a
character string. For example, in file LBJ, if scanning begins at
the beginning of each record, the authors' names will be picked
up in addition to title words. In a lengthy list it would be
desirable to keep the complete name together (e.g. SMITH, JOHN J.
or SMITH, J. J. instead of SMITH) for sorting purposes; this also
prevents JOHN and J from being printed separately. If table T is
created to include the letters A-Z, the hyphen (-), the period (.) and
the comma (,), the name Smith, J.J. will be recognized as one
character string, and will print out as one keyword.

Likewise, terms considered concepts (e.g. information-retrieval,
heat-transfer) and other types of names (e.g. corporate names:
Control-Data-Corp., or journal names: SOFTWARE-AGE) will provide
a more meaningful and economical index if kept together. Only
a small amount of extra preparation on input is required.



1. Stevens, Mary Elizabeth. Automatic Indexing: A State-of-the-Art
   Report. N.B.S. Monograph 91. 1965. p. 48.

2. Borko, Harold, ed. Automated Language Processing: the
   State of the Art. Wiley: 1967.

3. Fischer, Marguerite. The KWIC Index Concept: A Retrospective
   View. AMERICAN DOCUMENTATION 17: 57-70, April 1966.



Appendix I
Delimiters available on the CRT, teletype, and keypunch and their BCD Codes

Delimiter          BCD Code
colon                 12
equal                 13
ampersand             15
percent               16
left bracket          17
plus                  20
less than             32
period                33
right paren           34
sharp                 35
quotation mark        36
semicolon             37
exclamation           52
dollar sign           53
asterisk              54
North arrow           55
more than             57
blank                 60
slash                 61
right bracket         72
comma                 73
left paren            74
question mark         77

[The columns giving the keypunch, teletype, and CRT character for
each code are not legible in this copy.]

The BCD codes 13, 33, 34, 53, 54, 60, 61, 73, and 74 are the compatible codes
on all 3 input devices.



*KWOC Overview 

The KWOC program, stored in a file named *KWOC as an overlay,
consists of:

1. KWOC1: KWOC and PARPROC
2. SORTX
3. KWOC2

PARPROC accepts the parameter string from the user, and KWOC
reads variable length input records from the lun
contained in the word UNITIN and writes output records on TEMPOUT
(equ 50). Format of each output record is:

    Keyword (length KEYLENG) | Input Record (variable length, up to CBUFFLNG equ 250)

The program then reads in *SORTX
as an overlay and output records are sorted by keyword, then
by the rest of the record.

After all records are processed in this way, a file mark is written
on (UNITOUT) followed by the list of keywords that were used
and a second file mark.



Transfer of Parameters, Input, and Output During Execution

[Flow diagram. Components shown: Parameter String, PARPROC, TABLE,
(UNITIN), KWOC, SORTX, *KWOC2, (UNITOUT). Scratch units shown:
TEMPOUT equ 50, KWOCSORT equ 51, PARAMS equ 52, SORTOUT equ 53.
Arrow legend: C = Control, P = Parameters, O = Output, I = Input.]


KWOC Operation 

1. Equips lun's 50-54 as scratch files with *SORTX on 54.

2. Changes BREAK, BEGNCODE, and ENDCODE back to octal.
   They were assumed to be decimal numbers by PARPROC, which
   is lun oriented, but were really BCD codes.

3. Writes contents of TABLE on PARAMS to be used by *KWOC2.

4. Reads in list of suppressed words from (SUPPRESS) and
   stores them in SLIST. If (SUPPRESS) contained a not
   accessed bit, this operation is ignored. Format of SLIST is
       <WORD> | 00000000 | <WORD> ... | 00000000 |
   where each | falls on a word boundary.
   Length of SLIST stored in SLLENG.

5. Read a record into CARDBUFF and set scanner to start
   scanning from beginning of that buffer.

6. Set jump exits for a special character where scanning
   is to begin or end, as specified by parameters BEGNCODE
   and ENDCODE.

7. Scan off a symbol into ACCRUBUF.

8. Branch on type of symbol contained in TYPE. Types are
       5   = alphabetic string
       11  = numeric string
       BCD = BCD code of any special character
             is its type.
   Branches are
       5     -> 9)
       11    -> 7)
       Other -> 7)

   If a BEGNCODE was specified, the program scans each
   symbol in turn, but no further processing occurs until
   TYPE = (BEGNCODE).

   If an ENDCODE was specified, the program reads a new
   record when TYPE = (ENDCODE).

9. Check if word contained in SLIST
       yes -> 7)
       no  -> 10)

10. Write ACCRUBUF.

11. -> 7)



All parameters are taken from TABLE. In the order given there,
parameters are

UNITIN     lun of input file
UNITOUT    lun of output file
SUPPRESS   lun of words not to be used as keywords.
           Format of this file is one word per record.
LINESIZE   Maximum length, in characters, of output records.
BREAK      BCD code where an output record is to be split
           if record length exceeds L. If (BREAK) > (77) octal
           the line will be broken exactly at L characters.
BEGNCODE   BCD code of character in record that will cause
           keyword searching to begin.
ENDCODE    BCD code of character in record that will
           cause keyword searching to terminate.



Operation of Scanner 

The SCANNER is a three state processor that is used to scan and
accumulate symbol strings from the input record stored in
CARDBUFF. Strings are of three types:

1. <alphanumeric> ::= <alphabetic><alphanumeric>

2. <numeric> ::= <number> | <number><numeric>

3. <other> ::= any other character except space comprises
   a one character string

The SCANNER exits with the string in ACCRUBUF, the string
length in LENGTH, and the string type in TYPE.

Directories and tables are:

ATABLE - a table of actions that point to the following
    a1: ACCFET - accrue character and fetch next character.
    a2: CHROUT - exit from scanner with special character.
    a3: SKIP   - ignore current character in window.
    a4: ACCOUT - exit from scanner with character string.
STABLE - a table for computing the next state. States are
    S0: initial state
    S1: stacking an alphanumeric symbol
    S2: stacking a numeric string
CTABLE - a table that translates a BCD character to its
    character class. Classes are
    0 - alphabetic
    1 - special character
    2 - decimal digit
    3 - ignore

Counters and lists are

NSP     the next state pointer, used to compute the next state.
WINDOW  holds character that was input at time t.
CLASS   holds class of character in window.
STATE   the current state indicator.
ACCRUE  a buffer used to hold the character string being
        accrued.
LENGTH  the length of string in the ACCRUE buffer.
TYPE    the type of string in the ACCRUE buffer. If string
        is a special character, then type = char.
FCHAR   a pointer on the character string being scanned.

Method of operation can be described as follows: As in a
finite state machine, the behavior of the scanner is determined by

    S_t = F(S_{t-1}, I_t)
    a_t = G(S_t, I_t)

where S_t is the state at time = t,
      a_t is the action at time = t,
and   I_t is the input symbol at time = t.

Character classes:
    0 = alphabetic
    1 = special character
    2 = number
    3 = ignore

States:
    S0 = initial state
    S1 = stacking alphanumeric symbol
    S2 = stacking a numeric string

Actions:
    a1 = accrue and fetch
    a2 = character out
    a3 = skip
    a4 = accrue buffer out

[State transition diagram; arcs are labeled class/action, e.g. 3/a3.]
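
A table-driven scanner of this kind can be sketched in Python as follows. The states, classes, and actions follow the lists above, but the transition table itself is an assumption, since the original diagram is not reproduced here; in particular, letting a letter extend a numeric string is a choice made so that "3300CDC" does not yield a word. Special characters are returned as their own type instead of a BCD code.

```python
import string

ALPHA, SPECIAL, DIGIT, IGNORE = 0, 1, 2, 3            # character classes
ACCFET, CHROUT, SKIP, ACCOUT = "a1", "a2", "a3", "a4"  # actions
S0, S1, S2 = 0, 1, 2                                   # states

# (state, class) -> (action, next state). Assumed table, not the original one.
TRANSITIONS = {
    (S0, ALPHA): (ACCFET, S1), (S0, DIGIT): (ACCFET, S2),
    (S0, SPECIAL): (CHROUT, S0), (S0, IGNORE): (SKIP, S0),
    (S1, ALPHA): (ACCFET, S1), (S1, DIGIT): (ACCFET, S1),
    (S1, SPECIAL): (ACCOUT, S0), (S1, IGNORE): (ACCOUT, S0),
    (S2, ALPHA): (ACCFET, S2), (S2, DIGIT): (ACCFET, S2),
    (S2, SPECIAL): (ACCOUT, S0), (S2, IGNORE): (ACCOUT, S0),
}

def char_class(ch):
    if ch in string.ascii_uppercase or ch == "-":
        return ALPHA
    if ch in string.digits:
        return DIGIT
    if ch == " ":
        return IGNORE
    return SPECIAL

def scan(record):
    """Yield (string, type): type is 5 (alphabetic), 11 (numeric), or the character."""
    state, accrue, kind, pos = S0, "", None, 0
    text = record + " "                  # trailing space flushes the last string
    while pos < len(text):
        ch = text[pos]
        action, nxt = TRANSITIONS[(state, char_class(ch))]
        if action == ACCFET:             # accrue character and fetch the next one
            if state == S0:
                kind = 5 if nxt == S1 else 11
            accrue += ch
            pos += 1
        elif action == CHROUT:           # exit with a special character
            yield ch, ch
            pos += 1
        elif action == SKIP:             # ignore the character in the window
            pos += 1
        else:                            # ACCOUT: exit with the string, keep ch in window
            yield accrue, kind
            accrue = ""
        state = nxt

print(list(scan("KWOC INDEXING, ITS")))
```

Keeping the behavior in a table means that redefining the alphabetics only requires a different char_class function; the loop itself does not change.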



Operation of PARPROC 

PARPROC is a parameter processor used to process the user's
parameter string. Parameters are placed in a table named TABLE.
Each parameter entry in TABLE contains the following information:



[Diagram of a TABLE entry word: fields for the parameter's ASCII
character and its default value, plus one flag bit each for
NAME/LUN, Abort, Define, Literal, Accessed, and Destruct.]

Define     1: Equip lun or save name if parameter not
              already equipped or saved.
           0: Don't equip; don't save.

Abort      1: Abort if lun not equipped or if name not
              saved and if define = 0.
           0: Don't abort.

NAME/LUN   1: Parameter is a file name. Lower nine bits of
              TABLE word used as a pointer to two word name
              in NAMETBL.
           0: Parameter is a lun.

Destruct   1: Unequip lun after run.
           0: Don't unequip lun.

Accessed   1: Parameter not supplied by user. Default
              value present in word.
           0: Parameter supplied by user. Default value
              erased.

Literal    1: Parameter is a literal; don't treat it as
              a file.
           0: Parameter is a lun or file name.



Control proceeds as follows: 

1. Read a character. If CR -> 6)

2. Test if parameter letter
       Yes -> 3)
       No  -> 1)

3. Look for equal sign
       No  -> 3)
       Yes -> 4)

4. Read next characters. If alphabetic string, store in
   NAMETBL. If numeric string, store in lower 9 bits of
   TABLE parameter word.

5. Process all parameters as directed by bits in TABLE
   parameter word.

The contents of TABLE may be changed with the TABLE creation
macro, CREATE.
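
The outcome of this processing can be sketched in Python. The sketch works field by field instead of character by character, and the function name parse_parameters and the dictionary layout are invented for the example; the default values are the ones given in the Program Description, and the B, X, and Y values are read as octal BCD codes.

```python
# Defaults from the Program Description: input lun 60, output lun 61,
# line size 110, break at a space (60 octal); the rest are unset.
DEFAULTS = {"I": 60, "O": 61, "S": None, "L": 110,
            "B": 0o60, "X": None, "Y": None, "T": None}
BCD_PARAMETERS = {"B", "X", "Y"}

def parse_parameters(text):
    """Return a table of parameter values from a *KWOC parameter string."""
    table = dict(DEFAULTS)
    for field in text.split(","):
        name, equals, value = field.strip().partition("=")
        if not equals or not name:
            continue                      # e.g. the program name *KWOC
        letter = name[0].upper()          # "I" and "Input" are equivalent
        if letter not in table:
            continue
        value = value.strip()
        if letter in BCD_PARAMETERS:
            table[letter] = int(value, 8)     # BCD codes are octal numbers
        elif value.isdigit():
            table[letter] = int(value)        # a lun or a line size
        else:
            table[letter] = value             # a file name
    return table

print(parse_parameters("*KWOC,I=LBJ,S=HHH,L=40,X=54,Y=54,B=13,O=RMN"))
```

Parameters that the user omits simply keep their default entries, which mirrors the role of the default value stored in each TABLE word.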



Operation of SORTX 

SORTX, stored in *SORTX, is a modified version of *SORT.

Modifications are

1. PARUNIT has been changed to default to KWOCSORT (equ 51)
   for its parameters.

2. The program no longer writes its parameters on lun 61.

3. The exit has been changed to read in *KWOC2 as an
   overlay and then jump to it.

Operation of POSTKWOC 

The program, stored as an overlay in file *KWOC2, uses the
output from SORTOUT (equ 53) and the TABLE created by *KWOC1, read from
PARAMS (equ 52).

A simple loop is used to output the sorted records:

1. Read record.

2. Test if keyword same as previous record
       Yes -> 4)
       No  -> 3)

3. Output keyword.

4. Output record. -> 1)



All output is directed to UNITOUT.

Each record is then written. If a BREAK character is given,
the line is scanned for it. If such a character is found, the output
line is shortened appropriately. If no BREAK character is
discovered, the line length is equal to LINESIZE.

Each time a keyword is output to UNITOUT by POSTKWOC, that
keyword is also written on KWLIST (equ 54). At the end of the
program, KWLIST is copied onto UNITOUT.
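
The output loop can be sketched in Python as follows. The function name post_kwoc, the use of in-memory lists in place of UNITOUT and KWLIST, and the "<EOF>" string standing in for a file mark are all assumptions made for the example; line breaking is left out.

```python
def post_kwoc(sorted_records):
    """sorted_records: (keyword, record) pairs already sorted by keyword, then record."""
    unitout, kwlist = [], []
    previous = None
    for keyword, record in sorted_records:
        if keyword != previous:            # keyword differs from the previous record
            unitout.append(keyword)
            kwlist.append(keyword)         # keyword is also written on KWLIST
            previous = keyword
        unitout.append(" " * 12 + record)  # twelve leading blanks
    unitout.append("<EOF>")                # stands in for the first file mark
    unitout.extend(kwlist)                 # KWLIST is copied onto UNITOUT
    unitout.append("<EOF>")                # stands in for the second file mark
    return unitout

pairs = [
    ("KWOC", "KWOC INDEXING, ITS USES AND ABUSES"),
    ("KWOC", "LITTLE KNOWN FACTS ABOUT KWOC INDEXING"),
    ("LAMB", "MARY HAD A LITTLE LAMB"),
]
for line in post_kwoc(pairs):
    print(line)
```

Because the records arrive already sorted, a single comparison with the previous keyword is enough to decide when a new heading is needed.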



